NSF PAR Search | NSF Public Access Repository


Metadata Enhancement Using Large Language Models

Song, Hyunju; Bethard, Steven; Thomer, Andrea K (August 2024, Association for Computational Linguistics)

Full Text Available
Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations

Ki, Dayeon; Carpuat, Marine; Duh, Kevin; Gomez, Helena; Bethard, Steven (June 2024, Association for Computational Linguistics)

Machine Translation (MT) remains one of the last NLP tasks where large language models (LLMs) have not yet replaced dedicated supervised systems. This work exploits the complementary strengths of LLMs and supervised MT by guiding LLMs to automatically post-edit MT with external feedback on its quality, derived from Multidimensional Quality Metric (MQM) annotations. Working with LLaMA-2 models, we consider prompting strategies varying the nature of feedback provided and then fine-tune the LLM to improve its ability to exploit the provided guidance. Through experiments on Chinese-English, English-German, and English-Russian MQM data, we demonstrate that prompting LLMs to post-edit MT improves TER, BLEU and COMET scores, although the benefits of fine-grained feedback are not clear. Fine-tuning helps integrate fine-grained feedback more effectively and further improves translation quality based on both automatic and human evaluation.
Full Text Available
Toward NEPA performance: A framework for assessing EIAs

https://doi.org/10.1016/j.eiar.2022.106879

Emerson, Kirk; Baldwin, Elizabeth; Scott, Tyler A.; Pidot, Justin R.; Lien, Aaron M.; Currim, Faiz; Bethard, Steven; Ram, Sudha; Miller, Marc L.; López-Hoffman, Laura (November 2022, Environmental Impact Assessment Review)

Full Text Available
If You Want to Go Far Go Together: Unsupervised Joint Candidate Evidence Retrieval for Multi-hop Question Answering

https://doi.org/10.18653/v1/2021.naacl-main.363

Yadav, Vikas; Bethard, Steven; Surdeanu, Mihai (January 2021, Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

Full Text Available
Explainable Multi-hop Verbal Reasoning Through Internal Monologue

https://doi.org/10.18653/v1/2021.naacl-main.97

Liang, Zhengzhong; Bethard, Steven; Surdeanu, Mihai (January 2021, Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

Full Text Available
Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering

https://doi.org/10.18653/v1/2020.acl-main.414

Yadav, Vikas; Bethard, Steven; Surdeanu, Mihai (January 2020, Association for Computational Linguistics)

Full Text Available
Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering

https://doi.org/10.18653/v1/D19-1260

Yadav, Vikas; Bethard, Steven; Surdeanu, Mihai (January 2019, Association for Computational Linguistics)

Full Text Available
Addressing structural hurdles for metadata extraction from environmental impact statements

https://doi.org/10.1002/asi.24809

Laparra, Egoitz; Binford‐Walsh, Alex; Emerson, Kirk; Miller, Marc L.; López‐Hoffman, Laura; Currim, Faiz; Bethard, Steven (June 2023, Journal of the Association for Information Science and Technology)

Natural language processing techniques can be used to analyze the linguistic content of a document to extract missing pieces of metadata. However, accurate metadata extraction may not depend solely on the linguistics, but also on structural problems such as extremely large documents, unordered multi‐file documents, and inconsistency in manually labeled metadata. In this work, we start from two standard machine learning solutions to extract pieces of metadata from Environmental Impact Statements, environmental policy documents that are regularly produced under the US National Environmental Policy Act of 1969. We present a series of experiments where we evaluate how these standard approaches are affected by different issues derived from real‐world data. We find that metadata extraction can be strongly influenced by nonlinguistic factors such as document length and volume ordering and that the standard machine learning solutions often do not scale well to long documents. We demonstrate how such solutions can be better adapted to these scenarios, and conclude with suggestions for other NLP practitioners cataloging large document collections.
Inferring missing metadata from environmental policy texts

https://doi.org/10.18653/v1/W19-2506

Bethard, Steven; Laparra, Egoitz; Wang, Sophia; Zhao, Yiyun; Al-Ghezi, Ragheb; Lien, Aaron; López-Hoffman, Laura (June 2019, Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature)

The National Environmental Policy Act (NEPA) provides a trove of data on how environmental policy decisions have been made in the United States over the last 50 years. Unfortunately, there is no central database for this information and it is too voluminous to assess manually. We describe our efforts to enable systematic research over US environmental policy by extracting and organizing metadata from the text of NEPA documents. Our contributions include collecting more than 40,000 NEPA-related documents, and evaluating rule-based baselines that establish the difficulty of three important tasks: identifying lead agencies, aligning document versions, and detecting reused text.
Full Text Available
Eidos, INDRA, & Delphi: From Free Text to Executable Causal Models

https://doi.org/10.18653/v1/N19-4008

Sharp, Rebecca; Pyarelal, Adarsh; Gyori, Benjamin; Alcock, Keith; Laparra, Egoitz; Valenzuela-Escárcega, Marco A.; Nagesh, Ajay; Yadav, Vikas; Bachman, John; Tang, Zheng; et al. (June 2019, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations))
Full Text Available